    Classroom discourse in foreign language teacher education

    The analysis of recorded mini-lessons taught by pre-service foreign language teachers revealed that they lack a vocabulary specific to the classroom situation. This shows that developing general linguistic proficiency is not sufficient in the initial education of foreign language teachers: the specific language of the classroom must also be addressed systematically. To contribute to the education of these teachers, this paper presents an inventory of typical classroom utterances compiled from the difficulties observed in the study corpus. The inventory is proposed as a basis for activities that give pre-service teachers the opportunity to acquire the specific lexical proficiency required for their professional practice in a foreign language.

    Filling the gap: inserting an artificial constituent where a subject is omitted in Portuguese

    This paper reports the first efforts to insert null elements representing omitted subjects in Portuguese. Our aim is to fill gaps in the syntactic structure in order to facilitate the assignment of semantic role labels and thus provide a better training corpus for semantic role labeling (SRL) classifiers. The main advantage of inserting such null elements is reduced data sparsity, since all verbal clauses then become alike with respect to the presence of an explicit subject. The results show higher precision in the insertion of null elements for subjects of verbs inflected in the first person, both singular and plural.
    Samsung Eletrônica da Amazônia Ltda
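
The insertion step can be illustrated with a minimal Python sketch. It is not the authors' procedure: the clause representation (tokens with a POS tag, a syntactic role, and verb person/number), the tag names, and the placeholder form "*0*" are all hypothetical, chosen only to show an artificial subject constituent being added when a clause has no explicit subject.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    form: str
    pos: str                      # hypothetical tagset: "V" = finite verb, "N" = noun, ...
    role: Optional[str] = None    # hypothetical syntactic role, e.g. "SUBJ", "OBJ"
    person: Optional[int] = None  # verb person (1, 2, 3), if inflected
    number: Optional[str] = None  # "sg" or "pl"


NULL_FORM = "*0*"  # hypothetical placeholder for the omitted subject


def insert_null_subject(clause: List[Token]) -> List[Token]:
    """Return a copy of the clause with a null subject placed before the
    first finite verb, unless the clause already has an explicit subject."""
    if any(tok.role == "SUBJ" for tok in clause):
        return list(clause)
    for i, tok in enumerate(clause):
        if tok.pos == "V" and tok.person is not None:
            # The null element inherits person/number from the verb inflection.
            null = Token(NULL_FORM, "NULL", "SUBJ", tok.person, tok.number)
            return clause[:i] + [null] + clause[i:]
    return list(clause)


# "Compramos o carro." ("We bought the car."): subject omitted, verb in 1st person plural.
clause = [
    Token("Compramos", "V", person=1, number="pl"),
    Token("o", "DET"),
    Token("carro", "N", role="OBJ"),
]
for tok in insert_null_subject(clause):
    print(tok.form, tok.role, tok.person, tok.number)
```

The sketch handles only the simplest case. Third-person verbs are harder, because they may also head impersonal clauses (e.g. "Chove", "It rains") in which there is no subject to restore.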

    The treatment of discourse markers in a tool supporting academic writing in Portuguese for native Spanish speakers

    We report in this paper the development of a module dedicated to discourse markers in HABLA (Hispanofalantes Purchasing an Academic Linguistic Base), a tool designed to support native Spanish speakers in their academic writing in Portuguese. HABLA is intended to meet the needs of native Spanish speakers who are enrolled in Brazilian federal and state institutions and must write a dissertation or thesis in Portuguese. The diagnosis of the difficulties learners face in using discourse markers is based on the analysis of a learner corpus. Some of these difficulties are addressed by two already-implemented procedures that automatically identify the problems and present suggestions. The development of the module encompasses the compilation of a bilingual Spanish-Portuguese lexicon of discourse markers, as well as a list of false-friend discourse markers.
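
How a bilingual marker lexicon and a false-friend list could drive automatic identification and suggestions is sketched below in Python. This is an illustration under stated assumptions, not HABLA's implementation: the three-entry lexicon, the single false-friend note, and the function names are hypothetical. Matching is whole-word and case-insensitive, and false friends are only flagged for review, since a form such as "todavia" is also correct Portuguese.

```python
import re
from typing import List, Tuple

# Hypothetical miniature Spanish -> Portuguese lexicon of discourse markers.
ES_PT_MARKERS = {
    "sin embargo": "no entanto",
    "es decir": "ou seja",
    "por lo tanto": "portanto",
}

# Hypothetical false-friend list: Portuguese marker -> explanatory note.
FALSE_FRIENDS = {
    "todavia": "Portuguese 'todavia' means 'however'; for Spanish 'todavía' "
               "('still', 'yet') use 'ainda'.",
}


def _find(term: str, text: str) -> List[int]:
    """Start offsets of whole-word, case-insensitive matches of term."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return [m.start() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


def check_discourse_markers(text: str) -> List[Tuple[int, str, str]]:
    """Return (offset, marker, suggestion) triples sorted by position."""
    findings = []
    for es, pt in ES_PT_MARKERS.items():
        for pos in _find(es, text):
            findings.append((pos, es, f"Spanish marker; consider '{pt}'."))
    for pt, note in FALSE_FRIENDS.items():
        for pos in _find(pt, text):
            # Flagged for review only: the form may be used correctly.
            findings.append((pos, pt, note))
    return sorted(findings)


sentence = "Sin embargo, os dados todavia não foram analisados."
for offset, marker, suggestion in check_discourse_markers(sentence):
    print(offset, marker, suggestion)
```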

    Generating a lexicon of errors in Portuguese to support an error identification system for Spanish native learners

    Portuguese is a less-resourced language with respect to foreign language learning. To inform a module of a system designed to support the scientific writing of native Spanish speakers learning Portuguese, we developed an approach that automatically generates a lexicon of wrong words reproducing the language transfer errors made by such learners. Besides the wrong word, each item of the artificially generated lexicon contains the corresponding correct Spanish and Portuguese words. The wrong word is used to identify the interlanguage error, and the correct Spanish and Portuguese forms are used to generate suggestions. Because the correct word forms are kept, we can provide corrections or, at least, useful suggestions to learners. We propose combining two automatic procedures to obtain the error correction: (i) a similarity measure and (ii) a translation algorithm based on an aligned parallel corpus. The similarity-based method achieved a precision of 52%, whereas the alignment-based method achieved a precision of 90%. In this paper we focus only on interlanguage errors involving suffixes that have different forms in the two languages. The approach, however, is very promising for tackling other types of errors, such as gender errors.
    Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

    The functions of definition in bilingual dictionaries

    Adequate definitions have been a challenge for lexicographers for centuries. The search for definition patterns has revealed that requirements vary according to word class and word frequency. With the advent of bilingual dictionaries that include definitions, more commonly known as “semi-bilingual” dictionaries, it can be observed that definitions play different roles in bilingual dictionaries than in monolingual ones. Understanding these roles is one of the factors that enable lexicographers to include, in bilingual dictionaries, definitions that meet the needs of their target audience.

    Automatic generation of a lexical resource to support semantic role labeling in Portuguese

    This paper reports an approach to automatically generating a lexical resource to support incremental semantic role labeling (SRL) annotation in Portuguese. The data come from the Propbank-Br corpus (Propbank of Brazilian Portuguese) and from the lexical resource of the English Propbank, as both share the same structure. To enable this strategy, we added extra annotation to Propbank-Br. The approach follows an earlier decision to invert the process of implementing a Propbank project: first annotating a core corpus, and only then generating a lexical resource to enable further annotation. The reasoning behind this inversion is to explore the task empirically before distributing the annotation effort, and to provide simultaneously: 1) a first training corpus for SRL in Brazilian Portuguese and 2) annotated examples to compose a lexical resource supporting SRL. The main contribution of this paper is to show to what extent linguistic effort may be reduced, thereby speeding up the construction of a lexical resource to support SRL for less-resourced languages. The Propbank-Br corpus, with the extra annotation described herein, is publicly available.
    Samsung Eletrônica da Amazônia Ltda; FAPESP; CAPES (process number 151/2013)
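
The "annotate first, then derive the lexical resource" inversion can be sketched in Python as an aggregation over annotated instances. The instance format, the roleset identifier "comprar.01", and the function name are hypothetical; actual Propbank frame files are richer (for example, they describe each role), which this sketch omits. It only collects, per roleset, the role labels observed in the corpus and a few example sentences.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Instance:
    """One annotated predicate occurrence (hypothetical format)."""
    roleset: str            # e.g. "comprar.01"
    args: Dict[str, str]    # role label -> argument text
    sentence: str


def build_lexical_resource(instances: List[Instance], max_examples: int = 2) -> Dict[str, dict]:
    """Aggregate annotated instances into one entry per roleset, listing the
    role labels observed and up to max_examples example sentences."""
    roles: Dict[str, set] = defaultdict(set)
    examples: Dict[str, List[str]] = defaultdict(list)
    for inst in instances:
        roles[inst.roleset].update(inst.args)  # adds the role labels (dict keys)
        if len(examples[inst.roleset]) < max_examples:
            examples[inst.roleset].append(inst.sentence)
    return {rs: {"roles": sorted(roles[rs]), "examples": examples[rs]} for rs in roles}


corpus = [
    Instance("comprar.01", {"Arg0": "A empresa", "Arg1": "o terreno"},
             "A empresa comprou o terreno."),
    Instance("comprar.01", {"Arg0": "Ele", "Arg1": "um carro", "ArgM-TMP": "ontem"},
             "Ele comprou um carro ontem."),
]
print(build_lexical_resource(corpus))
```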

    The importance of false homographs for the automatic correction of spelling errors in Portuguese

    This paper reports the analysis of 25,722 pairs of Portuguese words that differ from each other by a single diacritic, here called “false homographs”. Such words are relevant for spelling correction: a misspelled word that lacks a diacritic is identical to a correct word, which prevents the misspelling from being identified and corrected. The purpose of the analysis is to identify pairs in which the non-accented form has low frequency and the accented form has high frequency, and then to exclude the less frequent non-accented forms from the lexicon used by a Portuguese spell checker. This is especially justified when the goal is to correct User-Generated Content (UGC) on the web, a kind of text characterized by, among other features, missing diacritics. The result is a list of 2,052 words that meet the requirements of the intended strategy.
    Samsung Eletrônica da Amazônia Ltd
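
The selection criterion can be sketched in Python. The frequency counts and the cut-off max_ratio are hypothetical (the abstract does not state a threshold), and treating the cedilla as a diacritic is an assumption of this sketch. A non-accented word is excluded when it differs from an accented lexicon entry in exactly one character position and is much rarer than that entry.

```python
import unicodedata
from typing import Dict, List


def strip_diacritics(word: str) -> str:
    """Remove combining marks (acute, tilde, cedilla, ...) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def differs_by_one_diacritic(accented: str, plain: str) -> bool:
    """True if the two words differ in exactly one character position."""
    return len(accented) == len(plain) and sum(a != b for a, b in zip(accented, plain)) == 1


def words_to_exclude(freq: Dict[str, int], max_ratio: float = 0.05) -> List[str]:
    """Non-accented forms much rarer than their accented false homograph.

    max_ratio is a hypothetical cut-off on freq(plain) / freq(accented).
    """
    lexicon = {unicodedata.normalize("NFC", w): n for w, n in freq.items()}
    excluded = set()
    for word, count in lexicon.items():
        plain = strip_diacritics(word)
        if plain == word or plain not in lexicon:
            continue  # no diacritic, or no non-accented counterpart in the lexicon
        if (count > 0 and differs_by_one_diacritic(word, plain)
                and lexicon[plain] / count < max_ratio):
            excluded.add(plain)
    return sorted(excluded)


# Hypothetical corpus frequencies.
freq = {"análise": 5000, "analise": 40, "está": 9000, "esta": 7000, "avó": 800, "avo": 3}
print(words_to_exclude(freq))
```

With these invented counts, "esta" is kept because it is nearly as frequent as "está", so removing it from the speller's lexicon would flag many correct uses.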

    A large corpus of product reviews in Portuguese: tackling out-of-vocabulary words

    Web 2.0 has enabled an unprecedented boom in communication. With the widespread use of computers and mobile devices, anyone, in practically any language, may post comments on the web. In these communicative situations formal language is not necessarily used; in fact, the language is marked by the absence of more complex syntactic structures and by the presence of internet slang, missing diacritics, repeated vowels, chat-speak abbreviations, emoticons, and colloquial expressions. Such language use poses severe new challenges for Natural Language Processing (NLP) tools and applications, which have so far focused on well-written texts. In this work, we report the construction of a large web corpus of product reviews in Brazilian Portuguese and the analysis of its lexical phenomena, which supports the development of a lexical normalization tool that will, in future work, enable the use of standard NLP products for web opinion mining and summarization.
    University of São Paulo; Samsung Eletrônica da Amazônia Ltda; FAPESP; CNP
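
To make the lexical phenomena above concrete, here is a minimal Python sketch of lexical normalization. It is not the tool the authors plan to build: the tiny lexicon and abbreviation list are hypothetical, and only two phenomena are handled (letter repetitions and chat-speak abbreviations), with anything still unknown flagged as out-of-vocabulary (OOV). Missing diacritics are not restored here.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical toy resources; a real system would use a full Portuguese lexicon.
LEXICON = {"muito", "bom", "você", "que", "também", "gostei", "do", "produto", "carro"}
ABBREVIATIONS = {"vc": "você", "q": "que", "tb": "também", "mt": "muito"}

# A character repeated three or more times, e.g. "muitooo".
_REPEATS = re.compile(r"(\w)\1{2,}")


def _lookup(token: str) -> Optional[str]:
    """Return the standard form if the token is a known word or abbreviation."""
    if token in LEXICON:
        return token
    return ABBREVIATIONS.get(token)


def normalize_token(token: str) -> Tuple[str, bool]:
    """Return (normalized form, is_oov) for a single token."""
    lowered = token.lower()
    found = _lookup(lowered)
    if found:
        return found, False
    # Portuguese has legitimate double letters ("rr", "ss"), so try
    # collapsing each run to one letter first and then to two.
    for keep in (1, 2):
        collapsed = _REPEATS.sub(lambda m: m.group(1) * keep, lowered)
        found = _lookup(collapsed)
        if found:
            return found, False
    return lowered, True


def normalize(text: str) -> List[Tuple[str, bool]]:
    """Whitespace-tokenize the text and normalize each token."""
    return [normalize_token(tok) for tok in text.split()]


print(normalize("Gostei muitooo do carrrro vc tb"))
```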